Distributed learning with bagging-like performance

Authors

  • Nitesh V. Chawla
  • Thomas E. Moore
  • Lawrence O. Hall
  • Kevin W. Bowyer
  • W. Philip Kegelmeyer
  • Clayton Springer
Abstract

Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments with decision tree and neural network classifiers on various datasets show that, given partitions and bags of the same size, disjoint partitions result in performance equivalent to, or better than, bootstrap aggregates (bags). Many applications (e.g., protein structure prediction) involve datasets that are too large to fit in the memory of a typical computer. Hence, bagging with samples the size of the data is impractical. Our results indicate that, in such applications, the simple approach of creating a committee of n classifiers from disjoint partitions, each of size 1/n (which will be memory resident during learning), in a distributed way results in a classifier with a bagging-like performance gain. The use of distributed disjoint partitions in learning is significantly less complex and faster than bagging. © 2002 Elsevier Science B.V. All rights reserved.
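The contrast between disjoint partitions and bootstrap bags described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the majority-vote rule, and the nearest-centroid base learner (a stand-in for the decision trees and neural networks used in the paper) are all assumptions chosen to keep the example dependency-free.

```python
import random
from collections import Counter


def disjoint_partitions(data, n, seed=0):
    """Shuffle the data and deal it into n disjoint subsets of about len(data)/n items."""
    if not 1 <= n <= len(data):
        raise ValueError("n must be between 1 and len(data)")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n] for i in range(n)]


def bootstrap_bags(data, n, bag_size, seed=0):
    """Draw n bags of bag_size items each, sampling with replacement (as in bagging)."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(bag_size)] for _ in range(n)]


def train_nearest_centroid(subset):
    """Hypothetical base learner: predict the label whose feature centroid is closest."""
    sums, counts = {}, Counter()
    for features, label in subset:
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(features):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(features, centroids[lab])),
        )

    return predict


def build_committee(subsets, train=train_nearest_centroid):
    """Train one classifier per subset; each subset could live on a different machine."""
    return [train(subset) for subset in subsets]


def committee_predict(members, features):
    """Majority vote over the committee (ties go to the label seen first)."""
    votes = Counter(member(features) for member in members)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated classes in two dimensions.
    rng = random.Random(42)
    data = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(50)]
    data += [((rng.gauss(8, 1), rng.gauss(8, 1)), 1) for _ in range(50)]

    n = 5
    partition_committee = build_committee(disjoint_partitions(data, n))
    bag_committee = build_committee(bootstrap_bags(data, n, len(data) // n))
    print(committee_predict(partition_committee, (0.0, 0.0)))
    print(committee_predict(bag_committee, (8.0, 8.0)))
```

In this sketch each committee member sees only about 1/n of the data, so every training subset can stay memory resident, and the members can be trained independently of one another. The bags are drawn at the same size as the partitions to mirror the comparison made in the abstract.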


Similar resources

Distributed learning with bagging-like performance

Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments with decision tree and neural network classifiers on various datasets show that, given the same size partitions and bags, disjoint partitions result in performance equivalent to, o...


Generating Classifier Committees by Stochastically Selecting both Attributes and Training Examples

Boosting and Bagging, as two representative approaches to learning classifier committees, have demonstrated great success, especially for decision tree learning. They repeatedly build different classifiers using a base learning algorithm by changing the distribution of the training set. Sasc, as a different type of committee learning method, can also significantly reduce the error rate of decision t...


Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...


Machine Learning Ensembles: An Empirical Study and Novel Approach

Two learning ensemble methods, Bagging and Boosting, have been applied to decision trees to improve classification accuracy over that of a single decision tree learner. We introduce Bagging and propose a variant of it — Improved Bagging — which, in general, outperforms the original bagging algorithm. We experiment on 22 datasets from the UCI repository, with emphasis on the ensemble’s accuracy ...


Stochastic Attribute Selection Committees

Classifier committee learning methods generate multiple classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. Two such methods, Bagging and Boosting, have shown great success with decision tree learning. They create different classifiers by modifying the distribution of the training set. This paper stu...



Journal:
  • Pattern Recognition Letters

Volume 24

Publication date: 2003